Biologically inspired learning in a layered neural net
A feed-forward neural net with adaptable synaptic weights and fixed, zero or non-zero threshold
potentials is studied, in the presence of a global feedback signal that can only have two values, 
depending on whether the output of the network in reaction to its input is right or wrong. 

It is found, on the basis of four biologically motivated assumptions, that only two forms of learning 
are possible, Hebbian and Anti-Hebbian learning. Hebbian learning should take place when the 
output is right, while there should be Anti-Hebbian learning when the output is wrong. 

For the Anti-Hebbian part of the learning rule a particular choice is made, which guarantees an 
adequate average neuronal activity without the need of introducing, by hand, control mechanisms 
like extremal dynamics. A network with realistic, i.e., non-zero threshold potentials is shown to 
perform its task of realizing the desired input-output relations best if it is sufficiently diluted, i.e. 
if only a relatively low fraction of all possible synaptic connections is realized. 

PACS numbers: 87.18.Sn, 84.35.+i, 07.05.Mh 



I. INTRODUCTION 

In this article we will try to contribute to the study of 
biological neural networks, in particular with respect to 
learning. Since we are interested in the basic principles 
rather than subtle biological details, we will use simplified 
models, although we do not allow for any properties that 
are unrealistic from a biological point of view. In this 
way we hope to reveal the essentials, without blurring 
the analysis with (irrelevant, biological) details. 

It is also not our aim to construct a network that is 
optimized for some particular task; our only purpose is to 
study a model that resembles an actual biological neural 
net and the way it might learn. 

In section II we define the model that we will study:
a simple feed-forward network with one hidden layer in
which not all possible connections are present (i.e., the
dilution is non-zero). Each neuron of the net has three
variables associated with it: a (fixed) threshold potential
$\theta$, a (variable) membrane potential $h$ and a state $x$,
which is assumed to take two values only, depending on
whether the neuron fires or is quiescent.

This article fits into a line of biologically motivated
research. Chialvo and Bak [2] suggested learning by
punishment only, via the release of some hormone. Heerema
et al. [1] derived rules for a biological neural net when
it acts as a memory. Bosman et al. [3] added this rule
to the model of Chialvo and Bak and thereby found a
significant improvement of the network's performance.

The dynamics of a neural net is determined by a rule
that tells a neuron when to fire. A biological neuron fires
if its membrane potential $h$ exceeds its threshold $\theta$. In the
model of Chialvo and Bak this biological rule is replaced
by the rule that in each layer a fixed number of neurons,
having the highest membrane potentials, fire. They refer
to this rule by the name of 'extremal dynamics'.
Extremal dynamics has the drawback that the number of
active neurons is artificially fixed, restricting the number
of possible output states. Alstrøm and Stassinopoulos [4]
did not use extremal dynamics, but they ran into the
difficulty that the network's activity became too small or too
large for the network to function satisfactorily. In order
to keep the activity at a desired level, they adapted the
neuron threshold potentials $\theta$ at each step of the learning
process.

It is one of our purposes to find a way in which a
biological net can keep its neuronal activity around an
acceptable level. Instead of putting in, by hand, some
controlling mechanism, we start from four biologically
motivated assumptions (i)-(iv), which lead to the
conclusion that only the so-called Hebbian and Anti-Hebbian
learning rules are plausible. We show that the
Hebbian learning rule fixes and strengthens the action
of the network at the moment it is applied, while the
Anti-Hebbian learning rule does the opposite and, when
applied repeatedly, will change the network's action. We
conclude that Hebbian learning should be associated with
reward, and should be applied if the network realizes the
desired output state in reaction to its input, while
Anti-Hebbian learning should be associated with punishment,
and should be applied when the output of the network is
wrong, in order to enable the network to search for a better
output.

In section III C we propose a particularly simple
Anti-Hebbian learning rule (essentially two constants), of which
we expect that it will keep the activity of the
neural network around a desired value automatically, i.e.,
without the need of some controlling mechanism. In
section V we perform a number of numerical simulations,
the details of which are given in section IV, in order to
verify whether our rule is capable of keeping the
activity around a desired level. Because the Hebbian learning
rule has already been studied earlier, we will first focus, in
section V, on the effect of the Anti-Hebbian learning rule. Our
simulations show that our Anti-Hebbian learning rule
indeed is successful in keeping the average activity around
a desired level, and, moreover, is very efficient in
generating different output states with adequate activities.

Having studied the effect of the Anti-Hebbian learning
rule we simulate, in section VI, networks using the
complete learning rule, including both the Hebbian and the
Anti-Hebbian contributions. We show that the complete 
learning rule enables the network to learn a number of 
input-output relations with an acceptable efficiency. Fur- 
thermore, it is shown that for non-zero threshold poten- 
tials, the performance is only acceptable if the network 
is sufficiently diluted. So 'cutting away' synaptic con- 
nections enhances the performance of the network. The 
observation that - in our model - some degree of non- 
connectedness is a conditio sine qua non for a properly 
functioning biological net, is one of our main observa- 
tions. 

The article closes with conclusions (section VII) and
an outlook (section VIII).

II. MODELING LEARNING 

In this section we describe our model and define quan- 
tities like activity and performance. 



A. Network model 

We consider a feed-forward neural net with one hidden
layer (figure 1). The net is taken to be diluted, i.e.
not all neurons are connected to all others. There are 
connections from the input layer to the hidden layer and 
from the hidden layer to the output layer but there are 
no direct connections from the input layer to the output 
layer nor any lateral or feed-back connections. 

Let us suppose that the neural net consists of $N$ binary
neurons $i$, $i = 1, 2, \ldots, N$. We will use the symbols I, H
and O to refer to the Input, Hidden and Output layers.
In order to denote that a neuron $i$ of the net belongs to
one of these layers we will write $i \in I$, $i \in H$ or $i \in O$,
respectively. The state $x_i$ of neuron $i$ either is active
($x_i = 1$) or is non-active ($x_i = 0$). The potential $h_i$, the
difference in potential between the inner and the outer
part of a neuron, is supposed to depend linearly on the
activities of the incoming synaptic connections:

$$h_i = \sum_{j \in V_i} w_{ij} x_j \qquad (1)$$

where $V_i$ is the collection of neurons $j$ that have an
afferent synaptic connection to neuron $i$. This formula can
be viewed as the defining expression for the weights $w_{ij}$.
Since the $x_j$'s are chosen to be dimensionless, the weights
have the dimension of a potential. Let $\theta_i$ be the potential
that should be exceeded in order that neuron $i$ becomes
active, i.e.,

$$x_i = 1 \ \text{if}\ h_i > \theta_i, \qquad x_i = 0 \ \text{if}\ h_i < \theta_i. \qquad (2)$$

An alternative way to specify the state of neuron $i$, as
given by equation (2), is to write

$$x_i = \Theta_H(h_i - \theta_i). \qquad (3)$$

The function $\Theta_H$ is the Heaviside step-function [$\Theta_H(y) = 0$
if $y < 0$ and $\Theta_H(y) = 1$ if $y > 0$]. The states of the
neurons of the input layer determine the state of the hidden
layer via (1) and (3). The state of the hidden layer, in
turn, determines the state of the output layer. Thus, if
the states $x_i$ for $i \in I$ are given at each time step $t_n$, we
can determine the network state at every time $t_n$, once we
have a rule that fixes the $w_{ij}$ at time $t_n$ ($n = 0, 1, 2, \ldots$).

FIG. 1: The feed-forward network has one hidden layer. The
symbols I, H and O refer to the Input, Hidden and Output
layers. In order to denote that a neuron $i$ of the net belongs
to one of these layers we will write $i \in I$, $i \in H$ or $i \in O$,
respectively. Furthermore, $N_X$ denotes the number of neurons
in layer X (X = I, H or O). The weights of the synapses
connecting either the input layer with the hidden layer or the
hidden layer with the output layer both are indicated by the
symbol $w_{ij}$. In this feed-forward network there are maximally
$N_I N_H$ possible connections from the input layer to the hidden
layer. If only the fraction $(1 - d_H)$, $0 \le d_H < 1$, of these
connections to the hidden layer is actually realized, we call
$d_H$ the dilution (of the connections with H). Similarly, $d_O$ is
the dilution of the connections from the hidden layer to the
output layer.

The number of neurons in layer X (X = I, H or O) is
denoted by $N_X$. The activity $a_X$ in layer X is defined by

$$a_X = \frac{1}{N_X} \sum_{i \in X} x_i, \qquad X = I, H, O. \qquad (4)$$

The weights of the synapses connecting either the input
layer with the hidden layer or the hidden layer with the
output layer both are indicated by the symbol $w_{ij}$.

In this feed-forward network there are maximally
$N_I N_H$ possible connections from the input layer to the
hidden layer. If only the fraction $(1 - d_H)$, $0 \le d_H < 1$,
of these connections to the hidden layer is actually
realized, we call $d_H$ the dilution (of the connections with H).
Similarly, $d_O$ is the dilution of the connections from the
hidden layer to the output layer.
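
The network dynamics of equations (1)-(4) can be illustrated with a minimal Python sketch. The function names, the representation of the dilution as an independent removal probability per connection, and the choice $x_i = 0$ in the undefined case $h_i = \theta_i$ are assumptions made for illustration only; they are not taken from the model definition above.

```python
import random

def make_mask(n_post, n_pre, dilution, rng):
    """Connectivity mask: True where a synapse j -> i exists.

    Assumption: every possible connection is removed independently
    with probability `dilution`, so that on average a fraction
    (1 - dilution) of the connections is realized.
    """
    return [[rng.random() >= dilution for _ in range(n_pre)]
            for _ in range(n_post)]

def layer_state(x_pre, weights, mask, theta):
    """Eqs. (1)-(3): h_i = sum_{j in V_i} w_ij x_j, x_i = Theta_H(h_i - theta).

    The case h_i == theta is left open by eq. (2); here it yields x_i = 0.
    """
    states = []
    for w_row, m_row in zip(weights, mask):
        h = sum(w * x for w, m, x in zip(w_row, m_row, x_pre) if m)
        states.append(1 if h > theta else 0)
    return states

def activity(x):
    """Eq. (4): fraction of active neurons in a layer."""
    return sum(x) / len(x)
```

The output state follows by applying `layer_state` twice, first from the input layer to the hidden layer and then from the hidden layer to the output layer.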



B. Learning input-output relations

Let us denote the state of layer X by $x_X =
(x_{X1}, x_{X2}, \ldots, x_{XN_X})$. We want the network to associate
with a particular, prescribed input state $\xi_I^\mu$, chosen from
a collection of $p$ input states $\xi_I^\mu$ ($\mu = 1, \ldots, p$), the
particular state $\xi_O^\mu$, chosen from a collection of $p$ prescribed
output states $\xi_O^\mu$ ($\mu = 1, \ldots, p$). The goal of the learning
process is that the network will be able to generate, for all
input patterns $\xi_I^\mu$, the correct output pattern $\xi_O^\mu$. This
will be achieved by a learning procedure, in which the
weights are adapted, stepwise, according to some rule.

If we present a pattern $\xi_I^\mu$ to the network by setting $x_I$
equal to $\xi_I^\mu$, the network will respond by generating an
output state $x_O$ which, in general, will not be equal to
the desired output state $\xi_O^\mu$. We will associate a variable
$r$ with each of the following two possibilities: $r = 1$ if the
output is right, i.e., if $x_O = \xi_O^\mu$, and $r = 0$ if the output
is wrong, i.e., if $x_O \neq \xi_O^\mu$. In practice, we proceed as
follows. We present the first input pattern $\xi_I^1$ to the net,
which results in an output state $x_O$. We keep
repeating this until $x_O$ equals $\xi_O^1$, the output pattern to be
associated with the input pattern $\xi_I^1$. The net then has
'learned' the first input-output relation $(\xi_I^1, \xi_O^1)$. Next,
we present the second input pattern, and continue
until the second input-output relation $(\xi_I^2, \xi_O^2)$ has been
learned, and so on until the $p$-th input-output relation
$(\xi_I^p, \xi_O^p)$ has been learned. While doing so, we
continuously change the weights according to some $r$-dependent
rule: the 'learning rule'.

When an input-output relation has been learned, one
or more input-output relations learned earlier may have
been forgotten. Therefore, we start a second round, in
which we try to learn all $p$ input-output relations $(\xi_I^\mu, \xi_O^\mu)$
again. In order to prevent possible effects due to a specific
learning order, the patterns are presented to the network
in random order: each round their order is shuffled. After
a number of rounds all input-output relations should be
recalled at once, i.e., input $\xi_I^\mu$ should result immediately
in output $\xi_O^\mu$ for all input-output relations.
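
The round-based procedure described above can be summarized in a short Python sketch. The callables `respond` and `learn`, the limits `max_rounds` and `max_trials`, and the decision to return the total number of presentations are hypothetical choices made for illustration; the text does not prescribe them.

```python
import random

def learn_all(patterns, respond, learn, max_rounds=100,
              max_trials=10_000, rng=None):
    """Present all (input, output) pairs in shuffled rounds.

    patterns : list of (xi_in, xi_out) pairs
    respond  : hypothetical callable, xi_in -> current output state
    learn    : hypothetical callable applying the r-dependent
               weight change (r = 1: right, r = 0: wrong)
    Returns the total number of presentations once a full round is
    recalled at once, or None if that does not happen in time.
    """
    rng = rng or random.Random()
    order = list(patterns)
    total_trials = 0
    for _ in range(max_rounds):
        rng.shuffle(order)                  # new random order each round
        recalled_at_once = True
        for xi_in, xi_out in order:
            trials = 0
            while True:
                trials += 1
                total_trials += 1
                r = 1 if respond(xi_in) == xi_out else 0
                learn(r)                    # weights change after every trial
                if r == 1:
                    break
                if trials >= max_trials:
                    return None             # give up on this pattern
            if trials > 1:
                recalled_at_once = False
        if recalled_at_once:
            return total_trials
    return None
```

Stopping after a round in which every pattern is reproduced at the first presentation mirrors the requirement that all relations be recalled at once.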

Do learning rules exist which accomplish all this? The
answer is positive [2, 3]. A particularly useful one is the
one proposed in section III.



C. Performance 

In order to judge the performance of our network, we 
need some measure. 

The a priori probability $P(N_O^{(a)})$ that the state of the
output layer $x_O$ equals $(1, 1, \ldots, 1, 0, 0, \ldots, 0)$, i.e., the
state with the first $N_O^{(a)}$ elements equal to one and the
remaining last $N_O - N_O^{(a)}$ elements equal to zero, is given
by

$$P(N_O^{(a)}) = p_O^{N_O^{(a)}} (1 - p_O)^{N_O - N_O^{(a)}} \qquad (5)$$

where $p_O$ is the probability that an arbitrary neuron of
the output layer is active ($0 < p_O < 1$). Note that the
probability to find an output pattern with $N_O^{(a)}$ active
neurons at arbitrary, prescribed places is also given by (5). Let
$N_O^{(a)}(\xi_O^\mu)$ be the number of active neurons in pattern $\xi_O^\mu$.
Then, the average a priori number of trials needed to
arrive at the desired output pattern equals $1/P(N_O^{(a)}(\xi_O^\mu))$.
If $p$ input-output relations $(\xi_I^\mu, \xi_O^\mu)$ are to be realized
($\mu = 1, \ldots, p$), the average a priori number $M_a$ of trials
needed equals

$$M_a = \sum_{\mu=1}^{p} \frac{1}{P(N_O^{(a)}(\xi_O^\mu))}. \qquad (6)$$

Comparing the average a priori number of trials and the
actual average number of trials $M$ needed to learn all
$p$ input-output relations we get a measure of how well
the network performs. This leads us to define the
performance $R$ as the quotient

$$R := M_a / M \qquad (7)$$

with $M_a$ given by (6). Note that the performance tends
to zero when the network is unable to learn all
input-output relations ($M \to \infty$) and the performance will be
1 for an 'ideal' network ($M = M_a$).

In some models [2, 3], the number of neurons that
fire is fixed in some way. The a priori probability
$P_{\text{fixed}}(N_O^{(a)}(\xi_O^\mu))$ that the output state $x_O$ equals the
output pattern $\xi_O^\mu$ is then given by

$$P_{\text{fixed}}(N_O^{(a)}(\xi_O^\mu)) = \frac{N_O^{(a)}!\,(N_O - N_O^{(a)})!}{N_O!}, \qquad (8)$$

the inverse of the number of ways that a state with $N_O^{(a)}$
active neurons can be realized.
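
As a worked illustration of equations (5)-(8), the following Python sketch computes the a priori probabilities, the a priori number of trials $M_a$ and the performance $R$. The function names and argument conventions are chosen here for illustration only.

```python
from math import comb

def prob_pattern(n_active, n_out, p_o):
    """Eq. (5): a priori probability of one specific output pattern
    with n_active active neurons out of n_out."""
    return p_o ** n_active * (1 - p_o) ** (n_out - n_active)

def a_priori_trials(active_counts, n_out, p_o):
    """Eq. (6): M_a, summed over the patterns mu = 1..p.
    active_counts[mu] is the number of active neurons in xi_O^mu."""
    return sum(1.0 / prob_pattern(n, n_out, p_o) for n in active_counts)

def performance(m_actual, active_counts, n_out, p_o):
    """Eq. (7): R = M_a / M."""
    return a_priori_trials(active_counts, n_out, p_o) / m_actual

def prob_pattern_fixed(n_active, n_out):
    """Eq. (8): a priori probability when the number of firing
    neurons is fixed; inverse of the number of such states."""
    return 1 / comb(n_out, n_active)
```

For example, with two output neurons, $p_O = 0.5$ and two patterns with one active neuron each, `a_priori_trials([1, 1], 2, 0.5)` evaluates to 8, so a network that needs $M = 16$ trials has performance $R = 0.5$.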



D. Activity distribution 

Let us consider an arbitrary collection of $M$ neurons,
all of which have (independent) probabilities $p$ to fire
($0 < p < 1$). Then the probability $Q(a)$ that the activity
of this collection of $M$ neurons equals $a$ is given by

$$Q(a) = \frac{M!}{(Ma)!\,[M(1-a)]!}\, p^{Ma} (1 - p)^{M(1-a)} \qquad (9)$$

where $Ma$ is the number of neurons that fire ($Ma$ is
an integer). This formula differs from (5) by a
binomial factor, corresponding to the fact that there are
$M!/\{(Ma)!\,[M(1-a)]!\}$ possible states with the same
activity $a$. We will use this formula in section V when
we compare $Q(a)$ to the actual activity of the net.
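
Equation (9) is the binomial distribution expressed in terms of the activity $a = k/M$, with $k$ the number of firing neurons. A minimal Python sketch (the function name is chosen here for illustration) tabulates $Q(a)$ for all admissible activities:

```python
from math import comb

def activity_distribution(m, p):
    """Eq. (9): Q(a) for a = k/m, k = 0..m, when each of the m neurons
    fires independently with probability p."""
    return {k / m: comb(m, k) * p ** k * (1 - p) ** (m - k)
            for k in range(m + 1)}

# Example: two neurons with firing probability 1/2.
q = activity_distribution(2, 0.5)
```

Such a table can be compared with a histogram of measured layer activities.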

E. Stability coefficient 

Let us define the 'two time stability coefficient'

$$\gamma_i(t_n, t_m) := (2x_i(t_n) - 1)(h_i(t_m) - \theta_i) \qquad (10)$$

which is a generalization of the usual stability coefficient.
With equation (2) it follows that, at two times $t_n$
and $t_m$, we have

$$\gamma_i(t_n, t_m) > 0 \;\Rightarrow\; x_i(t_n) = x_i(t_m) \qquad (11)$$

$$\gamma_i(t_n, t_m) < 0 \;\Rightarrow\; x_i(t_n) \neq x_i(t_m). \qquad (12)$$

In particular, if $n = m$, we have

$$\gamma_i(t_n, t_n) > 0. \qquad (13)$$

The state of neuron $i$ does not change if $\gamma_i(t_n, t_m)$ is
positive. Moreover, the larger $\gamma_i(t_n, t_n)$, the more stable
the system is with respect to changes in the synaptic
weights. The $\gamma$'s will play a role in section III B.

III. DETERMINING THE LEARNING RULE 

As noted in the introduction, many neural networks
have a tendency to develop into a state where either most
neurons are active or most are non-active. Chialvo and Bak [2]
and Alstrøm and Stassinopoulos [4] solved this
problem in a biologically less plausible way: the authors do
not consider the threshold potentials at all [2], or allow
them to grow indefinitely [4]. We approach the problem
in a different way: we start from four biologically
plausible restrictions on the learning rule. This leads to the
conclusion that the learning rule can be seen as a
superposition of two types of learning, Hebbian and
Anti-Hebbian learning. Taking for the Anti-Hebbian part the
simplest ansatz one can think of as a function of the
state $x_i$ of the postsynaptic neuron $i$, we arrive at a
learning rule that, a priori, is biologically plausible. Moreover,
it turns out that the learning rule found in this way is
such that the neuronal activity is well-behaved.

A. Biologically plausible learning rules 

Let us suppose that the weights $w_{ij}(t_n)$ may be
changed stepwise by some learning process

$$w_{ij}(t_{n+1}) = w_{ij}(t_n) + \Delta w_{ij}(t_n) \qquad (14)$$

where $j$ is the presynaptic and $i$ the postsynaptic neuron.
The changes, given by $\Delta w_{ij}(t_n)$, are the quantities we are
after.

We start from four assumptions, each of which need
not be strictly true, but each of which is very plausible
biologically, at least in first approximation. These
assumptions generalize the assumptions of
Heerema and Van Leeuwen [1] by including the effect of
the feed-back signal $r$.

i. The changes in $w_{ij}$, $\Delta w_{ij}$, depend on the global
variable $r$ and the local variables $x_i$, $x_j$, $h_i$, $\theta_i$ and
$w_{ij}$, i.e.,

$$\Delta w_{ij} = \Delta w_{ij}(r, x_i, x_j, h_i, \theta_i, w_{ij}). \qquad (15)$$

Biologically this means that only variables that can
be 'felt' at the synapse between $j$ and $i$ can
influence the weight change. Of course, this includes the
global variable $r$, which could be realized in an
actual biological system by some chemical substance
released throughout the brain, and the strength of
the weight $w_{ij}$ itself. Since the state $x_j$ of neuron
$j$ determines whether or not neurotransmitters are
released at the synapse, $x_j$ can influence the weight
change. Because the synapse is located at the
dendrites or cell body of neuron $i$, we suppose that
variables local to neuron $i$, the state $x_i$ and the
potentials $h_i$ and $\theta_i$, also can influence the strength
of the weight change.

ii. The sign of $\Delta w_{ij}$ depends on $r$ and the neuron
states $x_i$ and $x_j$ only, i.e.,

$$\Delta w_{ij} = \sigma(r, x_i, x_j)\, \varepsilon_{ij}(r, x_i, x_j, h_i, \theta_i, w_{ij}) \qquad (16)$$

where $\sigma$ equals $-1$ or $+1$ and where $\varepsilon_{ij} > 0$.

Biologically this means that as long as the states
of the presynaptic and postsynaptic neurons do
not change and the global feed-back signal $r$ does
not change, the sign of the weight change will not
change, i.e., the variables $h_i$, $\theta_i$ and $w_{ij}$ can
influence the magnitude of the weight change, but they
cannot switch the learning from an increase to a
decrease as long as the states and $r$ do not change.

iii. There is only a change in $w_{ij}$ if the presynaptic
neuron $j$ is active, i.e.,

$$\varepsilon_{ij} = \varepsilon_{ij}(r, x_i, h_i, \theta_i, w_{ij})\, x_j. \qquad (17)$$

Biologically this means that if the presynaptic
neuron $j$ does not fire, there will be no change in the
synaptic efficacy $w_{ij}$. This is supposed because we
think that it is unlikely that the synaptic efficacy
will change if nothing happens, i.e., no
neurotransmitters are released into the synaptic gap.
Substituting (17) into (16) we find

$$\Delta w_{ij} = \sigma(r, x_i)\, \varepsilon_{ij}(r, x_i, h_i, \theta_i, w_{ij})\, x_j, \qquad (18)$$

where we replaced $\sigma(r, x_i, x_j)$ by $\sigma(r, x_i)$, which is
allowed since $\Delta w_{ij}$ is only non-zero if $x_j$ equals 1.

iv. Both in case $r = 0$ and $r = 1$, there is not only
enhancement or only diminishment of the weights.
This implies that $\sigma(r = 0, x_i)$ must take on both
the values $+1$ and $-1$. The same is true for
$\sigma(r = 1, x_i)$. Since $x_i$ is the only variable that is still
available to influence $\sigma$, we must have
$\sigma(r, x_i = 0) = -\sigma(r, x_i = 1)$, or, equivalently

$$\sigma(r, x_i) = \sigma(r)(2x_i - 1) \qquad (19)$$

where $\sigma(r)$ equals $-1$ or $+1$.

Biologically, this means that we think that it is
implausible that weights can either only increase
or only decrease. If the weights would only
increase, all the membrane potentials (1) would only
increase, and all neurons would end up firing [see
eq. (2)]. In case the weights would only decrease, all
membrane potentials would only decrease, and all
neurons would become non-active. Since $r = 0$ can
be true for long times as long as something is not
learned, and $r = 1$ can also last for long times as
long as the behavior is as desired, the assumption
must hold true both for $r = 0$ and $r = 1$.

Inserting (19) into (18) we finally have

$$\Delta w_{ij} = \sigma(r)\, \varepsilon_{ij}(r, x_i, h_i, \theta_i, w_{ij})\,(2x_i - 1)\, x_j, \qquad (20)$$

where $\sigma(r) = \pm 1$ (where $r$ can take on the values 0 and
1) and $\varepsilon_{ij} > 0$.

B. Implementing reward and punishment 

We will now discuss the effects of the weight changes
(20) for $\sigma(r) = +1$ and $\sigma(r) = -1$. To that end, consider
neuron $i$, a fixed but arbitrary neuron of the network.
Suppose that the neurons $j \in V_i$ do not change their
states during the time step $t_n \to t_{n+1}$:

$$x_j(t_{n+1}) = x_j(t_n), \qquad j \in V_i. \qquad (21)$$

Now, we multiply both sides of (14) by $x_j(t_n)$, sum over
all indices $j \in V_i$, and subtract the threshold potential $\theta_i$
to obtain

$$h_i(t_{n+1}) - \theta_i = h_i(t_n) - \theta_i + \sum_{j \in V_i} \Delta w_{ij}(t_n)\, x_j(t_n) \qquad (22)$$

where we used (1) and (21). Next, we multiply by the
factor $(2x_i(t_n) - 1)$. Using equation (10), we then get

$$\gamma_i(t_n, t_{n+1}) = \gamma_i(t_n, t_n) + \sum_{j \in V_i} \Delta w_{ij}(t_n)\,(2x_i(t_n) - 1)\, x_j(t_n). \qquad (23)$$

We now distinguish between $\sigma(r) = +1$ and $\sigma(r) = -1$.

Substituting (20) into (23) with $\sigma(r) = +1$ we find,
using $(2x_i - 1)^2 = 1$ and $x_j^2 = x_j$, that

$$\gamma_i(t_n, t_{n+1}) = \gamma_i(t_n, t_n) + \sum_{j \in V_i} \varepsilon_{ij}\, x_j(t_n). \qquad (24)$$

Recalling that $\gamma_i(t_n, t_n)$ and $\varepsilon_{ij}$ are positive, we see that
$\gamma_i(t_n, t_{n+1})$ is positive. Hence, according to (11), the state
of neuron $i$ has not changed:

$$x_i(t_{n+1}) = x_i(t_n).$$

In our case of a feed-forward network, the neurons of
the hidden layer get their input from neurons of the
input layer only. Consequently, if the input layer does not
change, going from time $t_n$ to time $t_{n+1}$, the state of a
neuron $i$ of the hidden layer does not change. This holds
true for any neuron $i$ of the hidden layer, implying that
the input of the output layer will not change either. In
other words, the output of the net does not change when
the input remains the same, although the weights $w_{ij}(t_n)$
change to $w_{ij}(t_{n+1})$ according to the rule (14). Hence,
the rule (20) with $\sigma(r) = +1$ conserves an input-output
relation.

Moreover, since $\gamma_i(t_n, t_{n+1})$ is larger than $\gamma_i(t_n, t_n)$,
as follows from (24), the new net is more stable. In
other words, learning with $\sigma(r) = +1$ engraves an
input-output relation into the memory of the net by properly
adapting its weights.

Next, substituting (20) into (23), but now with $\sigma(r) = -1$,
we find

$$\gamma_i(t_n, t_{n+1}) = \gamma_i(t_n, t_n) - \sum_{j \in V_i} \varepsilon_{ij}\, x_j(t_n). \qquad (25)$$

Since both $\gamma_i(t_n, t_n)$ and $\varepsilon_{ij}$ are positive, $\gamma_i(t_n, t_{n+1})$ is
smaller than $\gamma_i(t_n, t_n)$. Hence, the learning rule (20) with
$\sigma(r) = -1$ has the effect of decreasing the stability of the
network. As long as the state of the network does not
change (i.e. $x_j(t_{n+1}) = x_j(t_n)$ for all $j$), all stability
coefficients $\gamma_i(t_n, t_{n+1})$ decrease, and, at a certain moment
$t_m$ ($m > n$), $\gamma_i(t_m, t_{m+1})$ will become negative, at least
for one neuron $i$, implying, with (12), that

$$x_i(t_{m+1}) \neq x_i(t_m).$$

Consequently, repeated learning with $\sigma(r) = -1$ will
result in a change of the output related to the same input.
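
The opposite effects of $\sigma(r) = +1$ and $\sigma(r) = -1$ on the stability coefficient, equations (24) and (25), can be checked numerically for a single neuron. The sketch below is illustrative only: it assumes a constant $\varepsilon_{ij} = \varepsilon$ (the general rule allows a dependence on $x_i$, $h_i$, $\theta_i$ and $w_{ij}$), and all names and numerical values are chosen here for the example.

```python
def potential(weights, x_pre):
    """Eq. (1) for one neuron with fixed presynaptic states."""
    return sum(w * x for w, x in zip(weights, x_pre))

def stability(x_i, h_i, theta_i):
    """Eq. (10): gamma_i = (2 x_i - 1)(h_i - theta_i)."""
    return (2 * x_i - 1) * (h_i - theta_i)

def apply_rule(weights, x_pre, x_i, sigma, eps):
    """Eq. (20) with constant eps (an assumption of this sketch):
    Delta w_ij = sigma * eps * (2 x_i - 1) * x_j."""
    return [w + sigma * eps * (2 * x_i - 1) * x
            for w, x in zip(weights, x_pre)]

theta = 0.5
x_pre = [1, 1, 0]                       # presynaptic states, kept fixed
w = [0.5, 0.25, -0.25]
h = potential(w, x_pre)
x_i = 1 if h > theta else 0             # state at time t_n
g0 = stability(x_i, h, theta)           # gamma_i(t_n, t_n)

# sigma = +1: gamma grows by eps * sum_j x_j, cf. eq. (24)
w_hebb = apply_rule(w, x_pre, x_i, +1, 0.125)
g_hebb = stability(x_i, potential(w_hebb, x_pre), theta)

# sigma = -1: gamma shrinks by the same amount per step, cf. eq. (25)
w_anti = apply_rule(w, x_pre, x_i, -1, 0.125)
g_anti = stability(x_i, potential(w_anti, x_pre), theta)
w_anti2 = apply_rule(w_anti, x_pre, x_i, -1, 0.125)
g_anti2 = stability(x_i, potential(w_anti2, x_pre), theta)
```

With these numbers the stability coefficient rises under the $\sigma = +1$ update and falls under repeated $\sigma = -1$ updates until it becomes negative, at which point the neuron changes its state, as stated by (12).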

We now come to the main conclusion of this section.
If the network output is the wrong one ($r = 0$), we must
adapt the weights such that another output results, i.e., we
should use $\sigma(0) = -1$. If the network output is the right
one ($r = 1$), we should use $\sigma(1) = +1$ to consolidate this
situation. Hence, we may conclude

$$\sigma(r) = 2r - 1. \qquad (26)$$

With (26) we may write instead of (20)

$$\Delta w_{ij} = r\, \Delta w_{ij}^{H} + (1 - r)\, \Delta w_{ij}^{A} \qquad (27)$$

where

$$\Delta w_{ij}^{H} = +\varepsilon_{ij}^{H}(x_i, h_i, \theta_i, w_{ij})\,(2x_i - 1)\, x_j \qquad (28)$$

$$\Delta w_{ij}^{A} = -\varepsilon_{ij}^{A}(x_i, h_i, \theta_i, w_{ij})\,(2x_i - 1)\, x_j \qquad (29)$$

with

$$\varepsilon_{ij}^{H} = \varepsilon_{ij}(r = 1, x_i, h_i, \theta_i, w_{ij}) \qquad (30)$$

$$\varepsilon_{ij}^{A} = \varepsilon_{ij}(r = 0, x_i, h_i, \theta_i, w_{ij}). \qquad (31)$$

If both the presynaptic and postsynaptic neurons are
active ($x_j = x_i = 1$), the terms $\Delta w_{ij}^{H}$ and $\Delta w_{ij}^{A}$ are
positive and negative, respectively. We will refer to them
by the names of Hebbian and Anti-Hebbian learning. In
conclusion, we see from (27) that reward ($r = 1$) and
punishment ($r = 0$) may be associated in a unique way
with Hebbian and Anti-Hebbian learning, respectively.
A similar rule has been postulated by Barto and
Anandan.

What remains is to find explicit expressions for the
(positive) functions $\varepsilon_{ij}^{H}$ and $\varepsilon_{ij}^{A}$. This will be the subject
of the next section.

C. Determining explicit rules

For the Hebbian function $\varepsilon_{ij}^{H}$ we choose the biologically
plausible expression derived in [1]:

$$\varepsilon_{ij}^{H} = \eta_i \left[ \kappa - (h_i - \theta_i)(2x_i - 1) \right] \qquad (32)$$

where $\eta_i$ and $\kappa$ are constants. Note that $\kappa$ must be large
enough in order that $\varepsilon_{ij}^{H}$ be positive.

We now come to the Anti-Hebbian function
$\varepsilon_{ij}^{A}(x_i, h_i, \theta_i, w_{ij})$. Neither experimentally nor theoretically
are there clues regarding the precise form of this term.
Therefore, we simply choose two (positive) constants, $c_i^{(1)}$ and
$c_i^{(2)}$: $\varepsilon_{ij}^{A}(x_i = 0) = c_i^{(1)}$ and $\varepsilon_{ij}^{A}(x_i = 1) = c_i^{(2)}$. Since any
two positive constants can be expressed as $c_i^{(1)} = \rho_i \alpha_i$
and $c_i^{(2)} = \rho_i (1 - \alpha_i)$, where $\rho_i$ and $\alpha_i$ are two other
constants with $0 < \alpha_i < 1$ and $\rho_i > 0$, we can write

$$\varepsilon_{ij}^{A}(x_i = 0) = \rho_i \alpha_i, \qquad \varepsilon_{ij}^{A}(x_i = 1) = \rho_i (1 - \alpha_i) \qquad (33)$$

or, equivalently,

$$\varepsilon_{ij}^{A} = \rho_i \left[ \tfrac{1}{2} - (\alpha_i - \tfrac{1}{2})(2x_i - 1) \right]. \qquad (34)$$

Note that the expression (34) can be obtained from (32)
by the substitutions $\eta_i \to \rho_i$, $h_i \to \alpha_i$, $\kappa \to \tfrac{1}{2}$ and $\theta_i \to \tfrac{1}{2}$.
However, for (32) there exists a derivation, whereas (34) is an
educated guess only.

Upon substituting (32) and (34) into (28) and (29) we
obtain

$$\Delta w_{ij}^{H} = \eta_i \left[ \kappa (2x_i - 1) - (h_i - \theta_i) \right] x_j \qquad (35)$$

$$\Delta w_{ij}^{A} = -\rho_i (x_i - \alpha_i)\, x_j. \qquad (36)$$

The numerical study of section V will show that the
effect of the Anti-Hebbian term (36) is that the average
neuronal activity of the network takes a value controlled
by the parameter $\alpha$.

IV. SIMULATING THE NETWORK

This section is intended mainly for readers interested
in technical details of the simulations.

A. Determining parameter values

Up to now we did not specify the parameters $\eta_i$, $\kappa$, $\theta_i$,
$\rho_i$ and $\alpha_i$ occurring in the learning rule (27) combined
with (35) and (36). It has been argued that the Hebbian
learning rate $\eta_i$ should be proportional to the inverse of
the average number of neurons $j \in V_i$ that fire. It is
argued also that a reasonable approximation will suffice.
Therefore, we may choose $\eta_i$ equal to $\eta_X$ for all $i \in X$,
i.e., we may choose $\eta_i$ the same for all neurons $i$ of layer
X (X = H, O):

$$\eta_i = \eta_H, \quad i \in H, \qquad \eta_i = \eta_O, \quad i \in O \qquad (37)$$

with

$$\eta_H = \frac{\eta}{\bar{a}_I N_I (1 - d_H)}, \qquad \eta_O = \frac{\eta}{\bar{a}_H N_H (1 - d_O)} \qquad (38)$$



where the bar denotes a time average and where $\eta$ is some
positive constant, which we will call the (global) learning
rate. Note that $\eta_H$ is the learning rate associated with
the connections from the input layer I to the hidden layer
H. Similarly, $\eta_O$ is associated with connections from H to
O.

Analogously we take for the Anti-Hebbian learning
rate, or punishment rate, $\rho$:

$$\rho_i = \rho_H, \quad i \in H, \qquad \rho_i = \rho_O, \quad i \in O \qquad (39)$$

with

$$\rho_H = \frac{\rho}{N_I \bar{a}_I (1 - d_H)}, \qquad \rho_O = \frac{\rho}{N_H \bar{a}_H (1 - d_O)}. \qquad (40)$$

We will set the parameters $\theta_i = \theta_X$, $i \in X$ and $\alpha_i = \alpha_X$,
$i \in X$, i.e., we take these parameters the same also for
neurons belonging to one and the same layer.

The margin parameter $\kappa$ will be fixed at the value 1,
in agreement with the literature. This can be
done because, instead of varying $\kappa$, one can also vary
both the learning and punishment rates $\eta$ and $\rho$, with
the same effect.
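
The complete learning rule, equation (27) with the Hebbian part (35) and the Anti-Hebbian part (36), together with the layer-wise rates, can be sketched in Python as follows. The function names and the argument order are chosen here for illustration, and it is assumed that the punishment rates of (40) are scaled in the same way as the learning rates of (38).

```python
def layer_rate(global_rate, mean_pre_activity, n_pre, dilution):
    """Eq. (38), and by assumption also eq. (40): the global rate divided
    by the average number of active presynaptic neurons."""
    return global_rate / (mean_pre_activity * n_pre * (1 - dilution))

def delta_w(r, x_i, x_j, h_i, theta_i, eta_i, rho_i, alpha_i, kappa=1.0):
    """Weight change of synapse j -> i for feedback r (1: right, 0: wrong).

    Hebbian part,      eq. (35): eta_i [kappa (2 x_i - 1) - (h_i - theta_i)] x_j
    Anti-Hebbian part, eq. (36): -rho_i (x_i - alpha_i) x_j
    Combined,          eq. (27): r * Hebbian + (1 - r) * Anti-Hebbian
    """
    hebbian = eta_i * (kappa * (2 * x_i - 1) - (h_i - theta_i)) * x_j
    anti_hebbian = -rho_i * (x_i - alpha_i) * x_j
    return r * hebbian + (1 - r) * anti_hebbian
```

For a wrong output ($r = 0$) and an active presynaptic neuron, the weight decreases if the postsynaptic neuron is active and increases if it is not, with relative magnitudes set by $\alpha_i$; no change occurs when $x_j = 0$.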



B. Addition of noise 

Since the equations describing the network dynamics 
are deterministic, and the number of possible states of 
the network is finite, the system may suffer from peri- 
odic behavior, which is not realistic biologically, since in 
an actual biological net there is always some disturbing 
effect. Therefore, when performing our simulations, we
add some noise, in order to mimic reality.
To that end we employ the Gaussian distribution

$$F(x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2} \qquad (41)$$

with mean $\mu$ and standard deviation $\sigma$. In our
simulations we will replace the deterministic value $\Delta w_{ij}$ by the
probabilistic value $\Delta w_{ij}^{(\text{noisy})}$, the distribution of which is
given by

$$P(\Delta w_{ij}^{(\text{noisy})}) = F(\Delta w_{ij}^{(\text{noisy})}, \mu, \sigma) \qquad (42)$$

with

$$\mu = \Delta w_{ij}, \qquad \sigma = \Delta w_{ij}\, S, \qquad (43)$$

i.e., we replaced $\Delta w_{ij}$ by a Gaussian distributed quantity
$\Delta w_{ij}^{(\text{noisy})}$ with mean value $\Delta w_{ij}$ and standard deviation
$\Delta w_{ij} S$. Note that we have chosen the standard deviation
of the noise proportional to $\Delta w_{ij}$ with proportionality
constant $S$.
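
The noise prescription of (41)-(43) amounts to drawing each weight change from a Gaussian centred on its deterministic value. A minimal Python sketch, using the standard library only, is given below. Taking the absolute value of the weight change in the standard deviation is an assumption made here so that the width is non-negative for negative weight changes; the function name is illustrative.

```python
import random

def noisy_delta_w(delta_w, s, rng=random):
    """Draw the noisy weight change of eqs. (42)-(43).

    Mean: the deterministic value delta_w.
    Standard deviation: |delta_w| * s (absolute value is an assumption
    of this sketch, see the accompanying text).
    `rng` may be the `random` module or a `random.Random` instance.
    """
    return rng.gauss(delta_w, abs(delta_w) * s)
```

With this choice a vanishing weight change stays exactly zero, so synapses with an inactive presynaptic neuron remain unaffected by the noise.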



C. Initializing the network 

As long as a network does not memorize input-output 
relations, the Anti-Hebbian term of the learning rule 
will adapt the weights in such a way that all neurons 
will 'hesitate' between firing and not firing. When the 
Anti-Hebbian learning rule is applied for a long time, 
the network can be viewed as 'fresh': the network has 
not stored any information and can quickly change its 
behavior. We want to start our simulations with net- 
works that have their weights distributed according to 
such a fresh distribution. This can be accomplished by 
starting with arbitrary weights and applying the Anti- 
Hebbian learning rule a large number of times. We will 
use this approach, and, in every step, offer an arbitrary
input pattern with activity $a_I$ to the net.

Since it may take a very long time before the
distribution of all weights has reached its equilibrium by the
effect of the Anti-Hebbian learning rule, we will not start
with entirely arbitrary weights, but, instead, start from
a rough approximation. We choose to initialize with
weights that are distributed according to the Gaussian
distribution (41). For weights connecting neurons of the
input layer to neurons of the hidden layer, the changes
of the weights due to the Anti-Hebbian learning rule will
be of the order of $\rho_H$. Therefore, we suppose that the
width of the distribution of the weights will also be of
the order of $\rho_H$, and we take



(44) 



for the standard deviation of the initial Gaussian
distribution. For the mean $\mu$ we will use a value $w_H$, which is a
rough approximation of the average value of the strength
of the weights connecting neurons of the input layer to
neurons of the hidden layer. Since the Anti-Hebbian
learning rule causes each neuron to change its state $x_i$
over and over again, the membrane potentials $h_i$ will
fluctuate around the threshold potentials $\theta_i$. As an
approximation we suppose that each membrane potential
$h_i$ of a neuron of the hidden layer will be, on average,
equal to its threshold potential:

$$h_i \approx \theta_H, \qquad i \in H. \qquad (45)$$

We now approximate $h_i$, given by (1), according to

$$h_i \approx w_H \sum_{j \in V_i} x_j. \qquad (46)$$

The sum in this formula can be approximated by

$$\sum_{j \in V_i} x_j \approx a_I N_I (1 - d_H), \qquad (47)$$

with $a_I$ the average activity of the input layer, which,
in our model, will be equal to the average activity of the
input patterns. Furthermore, $N_I (1 - d_H)$ is an
approximation for the number of presynaptic neurons. Substituting
(45) and (47) into (46), we obtain a rough approximation
of the average value of the strengths of the weights $w_H$:

$$w_H \approx \frac{\theta_H}{a_I N_I (1 - d_H)}. \qquad (48)$$



In the same way, we find for the weights connecting the
neurons of the hidden layer to the neurons of the output
layer

$$w_O \approx \frac{\theta_O}{a_H N_H (1 - d_O)} \qquad (49)$$

where $a_H$ is the average activity of the hidden layer. As
will turn out later, this average activity will be around
the parameter $\alpha_H$ occurring in the Anti-Hebbian learning
rule. We will use $\alpha_H$ as an estimate for $a_H$ in the above
formula. Analogously to (44), the corresponding expression
in $\rho_O$ will be used for the
standard deviation $\sigma$ for the weights connecting neurons
of the hidden layer to neurons of the output layer.
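
The rough initialization described in this section can be sketched in Python as follows. The mean weight follows (48) and (49); the standard deviation is passed in as a parameter `sigma`, since only its order of magnitude ($\rho_H$ or $\rho_O$) is fixed by the text above. Marking absent connections by `None` and removing each connection independently with probability equal to the dilution are assumptions of this sketch, as are the function names.

```python
import random

def mean_weight(theta, pre_activity, n_pre, dilution):
    """Eqs. (48)/(49): mean initial weight such that the membrane
    potential is, on average, equal to the threshold potential."""
    return theta / (pre_activity * n_pre * (1 - dilution))

def initial_weights(n_post, n_pre, dilution, mean, sigma, rng):
    """Gaussian initial weights, cf. eq. (41); None marks a connection
    that is absent (assumed removed with probability `dilution`)."""
    return [[rng.gauss(mean, sigma) if rng.random() >= dilution else None
             for _ in range(n_pre)]
            for _ in range(n_post)]

# Example: hidden-layer weights for theta_H = 1, a_I = 0.5, N_I = 10, d_H = 0.5.
w_h_mean = mean_weight(1.0, 0.5, 10, 0.5)
w_h = initial_weights(4, 10, 0.5, w_h_mean, 0.05, random.Random(0))
```

Starting from such weights, the Anti-Hebbian rule would then be applied a large number of times to random input patterns in order to reach the 'fresh' weight distribution.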



V. EFFECT OF THE ANTI-HEBBIAN 
COMPONENT 

It proves useful to study the Hebbian and Anti- 
Hebbian learning rules separately. Since the Hebbian rule 
has been studied earlier, we here consider Anti-Hebbian 
learning; in the next section the complete rule will be 
discussed. 

We will show that the activities, averaged in time, of
the hidden and output layers are given by $\alpha_H$ and $\alpha_O$
respectively. Moreover, we will show that the distribution
of the activities around the value $\alpha_H$ (or $\alpha_O$) is such that,
effectively, each neuron $i$ in layer $H$ (or $O$) has a
probability $\alpha_H$ (or $\alpha_O$) to fire, independent of the activities of
the other neurons. In other words, the proposed learning
rule (36) focuses the average activity in a natural way
around the values $\alpha_H$ and $\alpha_O$.

In case the threshold potentials $\theta_H$ and $\theta_O$ vanish, we
will find that the state of a neuron in the output layer
behaves independently of the states of the other neurons
of the output layer. In case the threshold potentials do
not vanish, we will observe strong correlations between
the states of the output neurons, which vanish, however,
when the dilution is taken to be sufficiently high.

A. Vanishing threshold potentials 

We start by studying the case that $\theta_H = \theta_O = 0$ and
$d_H = d_O = 0$. In order to get a first impression of the
network behavior, we will plot the activity as a function
of time, as well as, in a histogram, the distribution of its
values ($0 \le a \le 1$). We will offer the network $p = 1000$
input patterns. Each input pattern will be repeatedly offered
to the net until the output pattern to be associated
with the input pattern is found. The input and associated
output patterns are chosen at random but with
a certain specified activity. As soon as the correct output
is produced, the next pattern is presented until all
output patterns associated with the input patterns have
been found.
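
The presentation protocol just described can be sketched as follows. This is a hedged illustration only: `try_once` is a hypothetical callback standing in for one network update (produce an output, compare it with the target, adapt the weights), and the stand-in used in the demo simply guesses outputs at random instead of running the learning rule. Patterns are assumed to be binary vectors with a fixed number of active neurons.

```python
import random


def random_pattern(n_neurons, n_active, rng):
    """Binary pattern with exactly n_active ones at random positions."""
    active = set(rng.sample(range(n_neurons), n_active))
    return [1 if i in active else 0 for i in range(n_neurons)]


def train_until_all_found(pairs, try_once, max_steps=10**6):
    """Present each input until try_once(input, target) reports success.

    Returns the total number of time steps (presentations) used.
    """
    steps = 0
    for x, target in pairs:
        while not try_once(x, target):
            steps += 1
            if steps >= max_steps:
                raise RuntimeError("step budget exhausted")
        steps += 1  # count the successful presentation as well
    return steps


rng = random.Random(0)
pairs = [(random_pattern(20, 3, rng), random_pattern(10, 3, rng)) for _ in range(5)]


def random_guess(x, target):
    # Stand-in for the network: guess an output with the right activity.
    return random_pattern(len(target), sum(target), rng) == target


total_steps = train_until_all_found(pairs, random_guess)
```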

In figure 2 (top) we can observe that the activity $a_H$ of
the hidden layer fluctuates around the value $\alpha_H = 0.05$,
the value which we have chosen in the learning rule for
changes of weights of connections between the input and
hidden layers. This is what we had hoped to achieve
when we postulated (36). The larger fluctuations occur
when the net is confronted with a new input pattern, to
which it must learn to react by a new, prescribed output
pattern. We see that it takes only a short period of time
before the net has regained its balance. Similar observations
can be made with respect to the output layer: in
figure 2 (bottom) the activity is seen to fluctuate around
the value $\alpha_O = 0.3$, chosen in the learning rule (36).
It can also be seen that the distributions closely resemble
the distribution $Q(a)$ introduced earlier. This means that the
Anti-Hebbian learning rule effectively causes each neuron
in layer $X$ to fire with a probability $\alpha_X$, as if it were
independent of the states of the other neurons.
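
If each of the neurons in a layer fires independently with the same probability, the number of active neurons follows a binomial distribution. The sketch below evaluates this distribution for the output layer. It assumes that this binomial form is what the reference curve $Q(a)$ expresses; that is an assumption of the example, since the definition of $Q(a)$ is given elsewhere in the paper. The function name is hypothetical.

```python
from math import comb


def independent_activity_distribution(n_neurons, alpha):
    """P(a = k / n_neurons) for k = 0..n_neurons under independent firing."""
    return [
        comb(n_neurons, k) * alpha**k * (1.0 - alpha) ** (n_neurons - k)
        for k in range(n_neurons + 1)
    ]


# 10 output neurons and alpha_O = 0.3, the values quoted for figure 2.
probs = independent_activity_distribution(10, 0.3)
mean_activity = sum(k / 10 * p for k, p in enumerate(probs))
most_likely_k = max(range(len(probs)), key=probs.__getitem__)
```

Under these assumptions the mean activity equals the firing probability, and the histogram peaks at three active neurons out of ten.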



B. Non-vanishing threshold potentials

In this section we study a network that is the same as
in the preceding section, but with non-vanishing threshold
potentials: we will take $\theta_H = \theta_O = 1$. When we now
repeat the simulations, we find one important difference:
the neurons of the output layer turn out to exhibit strong
correlations: almost always, either all of them are active
or all of them are non-active. This can be seen in figure
3 (left). This is alarming and most undesirable, since it
would imply that the network has, effectively, only two
different output states. Note that although the probability
that $a_O$ equals $\alpha_O$ now is very small, its average
$\bar{a}_O$ is still close to $\alpha_O$.

It can be expected that if the average number of neurons
of some layer that are postsynaptic to a neuron
of the preceding layer becomes lower, the correlated
behavior of the neurons of the former layer, which, apparently, was
induced by the neurons of the preceding layer, will decrease. In order to
verify this, we now study a network with non-vanishing
dilution. In figure 3 (right), we plotted the distribution
of the activity $a_O$ of the output layer for five different
values of the dilution $d_O$. Comparing the left and right
pictures of figure 3 we indeed observe a dramatic change in the
behavior: the undesired effect quickly diminishes
with increasing dilution.

Note that the distribution is shifted somewhat towards
zero in comparison with the distribution of uncorrelated
output neurons. The shift decreases when the number of
neurons of the hidden layer increases: for $N_H = 20,000$
there is a better resemblance to the $Q(a)$-curve than for
$N_H = 2,000$. For non-zero dilution the probability that
none of the neurons that are presynaptic to an output
neuron fire increases, resulting in a lower average activity.
This causes the shift of the activity distribution to lower
values.
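
The size of this effect can be estimated with a short calculation. The sketch below is an illustration under two simplifying assumptions that are ours, not the paper's: every output neuron keeps the same number of presynaptic neurons, and the hidden neurons fire independently with the same probability. The function name is hypothetical.

```python
def prob_no_active_input(n_hidden, dilution, alpha_hidden):
    """P(no presynaptic neuron fires), assuming independent firing.

    Assumes every output neuron keeps round(n_hidden * (1 - dilution))
    presynaptic neurons, each firing with probability alpha_hidden.
    """
    n_pre = round(n_hidden * (1.0 - dilution))
    return (1.0 - alpha_hidden) ** n_pre


for n_hidden in (2000, 20000):
    for d in (0.0, 0.9, 0.99):
        print(n_hidden, d, prob_no_active_input(n_hidden, d, 0.05))
```

In this simplified picture the probability of receiving no input at all grows with the dilution and shrinks when the hidden layer is enlarged, in line with the trend described above.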



VI. TESTING THE FULL LEARNING RULE 

In the preceding section we saw that the Anti-Hebbian
part of the learning rule adapts the weights in such a
way that a desired input-output relation is found after a
number of time steps, while, at the same time, the network
activities $a_H$ and $a_O$ stay within acceptable bounds. In
this section we study the complete learning rule (27) with
(35) and (36).

A. Comparing to other models 

In order to make contact with existing literature, we
also simulate systems with extremal dynamics. We consider
input-output relations with the same number of
active neurons, $N_I^{(a)} = N_O^{(a)}$, in the input and
output patterns. In figure 4 we plotted the performance
$R$ of a network as a function of the number of input-output
relations $p$ to be learned, for $N_I^{(a)} = 1$, $N_I^{(a)} = 2$
and $N_I^{(a)} = 3$. The two pictures at the top are for the
case of extremal dynamics, the pictures at the bottom
correspond to simulations for the more realistic case that
the neurons fire if their membrane potentials exceed their
thresholds. The two pictures at the left are without reward
($\eta = 0$), those at the right are with reward ($\eta \neq 0$).
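
The difference between the two firing rules can be made concrete with a small sketch. It assumes that extremal dynamics means that a prescribed number of neurons with the largest membrane potentials fire, regardless of any threshold; this reading is an assumption of the example, and the function names and numbers are hypothetical.

```python
def fire_threshold(potentials, theta):
    """Threshold dynamics: a neuron fires if its potential exceeds theta."""
    return [1 if h > theta else 0 for h in potentials]


def fire_extremal(potentials, n_active):
    """Extremal dynamics: exactly the n_active largest potentials fire."""
    ranked = sorted(range(len(potentials)), key=lambda i: potentials[i], reverse=True)
    winners = set(ranked[:n_active])
    return [1 if i in winners else 0 for i in range(len(potentials))]


h = [0.2, 1.4, 0.9, 1.1, -0.3]
by_threshold = fire_threshold(h, theta=1.0)  # number of active neurons varies
by_extremal = fire_extremal(h, n_active=1)   # activity is fixed by hand
```

With threshold dynamics the number of firing neurons depends on the potentials themselves, which is why a separate mechanism is needed to keep the activity near a desired level.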

We conclude that the net is indeed able to learn a number of
input-output relations. The situation $N_I^{(a)} = 1$
(top, left) is analogous to the case considered by Chialvo
and Bak. In the simple case that $N_I^{(a)} = 1$, the performance
is almost perfect, i.e., $R$ is close to 1. However,
the performance decreases quickly for the case that more
than two neurons are active in the input and output patterns.
In figure 4 (top, right) we observe that rewarding
significantly increases the performance $R$ if $N_I^{(a)}$ is larger
than 1. A similar observation has been made earlier by
Bosman et al. [2].

FIG. 2: The activities of the hidden layer (top) and the output layer (bottom) as a function of time (left), and their distributions
(right). The network has $N_I = 20$ neurons in the input layer, $N_H = 2000$ neurons in the hidden layer and $N_O = 10$ neurons
in the output layer. All neuron thresholds vanish: $\theta_H = \theta_O = 0$. The dilutions are zero: $d_H = d_O = 0$. The parameters in the
learning rule are: learning rate $\eta = 0$, punishing rate $\rho = 0.01$, $\alpha_H = 0.05$ and $\alpha_O = 0.3$. The noise parameter is $\delta = 0.1$. The
number of patterns is $p = 1000$. The number of active neurons in the input patterns was $N_I^{(a)} = 3$. The left pictures show
only a small interval (500 time steps) of the total number of 429,919 time steps needed to find all desired output patterns.
The activities $a_H$ and $a_O$ are seen to wiggle around the values $\alpha_H$ and $\alpha_O$ (left pictures, dashed lines). The distributions of the
activities (right pictures, bars) resemble the distribution $Q(a)$ (right pictures, dashed lines).

FIG. 3: The distribution of the activity of the output layer for $\theta_H = \theta_O = 1$ and $d_H = 0$; other parameters are taken the same
as in figure 2. The neurons of the output layer are seen to fire in a strongly correlated way in case $d_O = 0$ (left, bars): they almost
always are either all active ($a_O = 1$) or all quiescent ($a_O = 0$). If no correlations at all were present, the dashed line of the left
figure would be found for the distribution of the neuron activity. This dashed line is given analytically by $Q(a)$. When
$d_O$ increases (right), the correlations of the activities of the output neurons are seen to decrease. Around the value $d_O = 0.9$
the activity distribution resembles the distribution $Q(a)$ the most. The resemblance becomes better if the number of hidden
neurons, $N_H$, increases.

FIG. 4: Comparing realistic (bottom) and extremal dynamics (top) with (right) and without (left) a rewarding component
in the learning rule. The performance is measured as a function of the number of input-output relations $p$. We have chosen
$N_I = N_O = 10$, $N_H = 2000$, $N_I^{(a)} = N_O^{(a)} = 1$, 2, or 3, $\theta_H = \theta_O = 0$, $d_H = d_O = 0$, $\rho = 0.01$, $\eta = 0$ (left pictures), $\eta = 0.02$
(right pictures), $\alpha_H = 0.025$, $\alpha_O = N_O^{(a)}/N_O$ and $\delta = 0.1$.

Figure 4 (left, bottom) is to be compared with figure
4 (left, top). For $N_I^{(a)} = 1$, the performance is not
as good as in case of extremal dynamics, but it still
works satisfactorily. However, for values of $N_I^{(a)}$ larger
than one, the performance is very bad. Adding reward
to the learning rule, we find the results of figure 4 (right,
bottom). The improvement of the performance is impressive.
Hence, our model performs the task of realizing prescribed
input-output relations reasonably well, although
its performance is not as good as in the less realistic case
that extremal dynamics is used.

We close this section with the following observation:
in case only one input-output relation is to be learned
by the net, i.e., $p = 1$, the performance $R$ is close to 1,
as can be read off from figure 4. This implies that the
period of search for the correct output by means of the
Anti-Hebbian term is close to the random search time. In
other words, in case the feed-back is binary only ($r = 0$
or $r = 1$), Anti-Hebbian learning enables a way of search
which is close to optimal.
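
To give a feeling for the order of magnitude of a random search time, the sketch below counts the candidate output patterns and the corresponding mean waiting time. It assumes that outputs are drawn uniformly and independently among all patterns with the prescribed number of active neurons. This is an assumption of the example and not the paper's definition of the performance $R$; the function names are hypothetical.

```python
from math import comb


def n_candidate_outputs(n_output, n_active):
    """Number of binary output patterns with exactly n_active active neurons."""
    return comb(n_output, n_active)


def expected_random_search_time(n_output, n_active):
    """Mean number of uniform, independent guesses until the target is hit.

    For success probability q = 1 / M per guess the waiting time is
    geometric with mean 1 / q = M.
    """
    return float(n_candidate_outputs(n_output, n_active))


# 10 output neurons with 1, 2 or 3 active neurons, as in figure 4.
search_times = {k: expected_random_search_time(10, k) for k in (1, 2, 3)}
```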



B. Numerical experiments on the influence of 
parameters 

The performance of a neural net depends on many parameters,
e.g., the coefficients $\eta$, $\rho$ and $\alpha$ occurring in
the learning rule, the dilution $d$ and the threshold potential
$\theta$. In this section we take the thresholds $\theta_X = 0$
($X = I, H, O$). In order to get more insight into how
the behavior of the net depends on variations of all
these parameters, we will study two particular cases.

In figure 5 we plotted the performance $R$ of the net as a
function of the parameter $\alpha_H$, the coefficient occurring in
the Anti-Hebbian part of the learning rule, which determines
the activity of the hidden layer. There is seen to be
an optimum in the performance. For values of $\alpha_H$ below
0.02, the performance decreases rapidly. It follows that
the parameter $\alpha_H$ occurring in the Anti-Hebbian part of
the learning rule, a neuron property, has an important
effect on the performance of the net. It is seen that the
performance diminishes when the activity becomes too
large, in agreement with earlier results 0. 




FIG. 5: Performance $R$ as a function of the parameter $\alpha_H$
determining the activity of the hidden layer for $p = 10$ input-output
relations. We set $N_I = N_O = 10$, $N_H = 2000$, $N_I^{(a)} =
N_O^{(a)} = 2$, $\theta_H = \theta_O = 0$, $d_H = d_O = 0$, $\rho = 0.1$, $\eta = 0.2$,
$\delta = 0.1$. The performance is best for values of $\alpha_H$ between
0.02 and 0.06.
FIG. 7: Performance of a network with $\theta_H = \theta_O = 1$ as a
function of the number of input-output relations $p$ for different
values of the dilution $d_O$: $d_O = 0$, $d_O = 0.5$, $d_O = 0.9$. We
chose $\alpha_H = 0.025$. Other parameters are the same as those in
figure 5. The network performance is best in this picture for
$d_O = 0.9$.




C. Non-vanishing threshold potentials

In view of section V B we expect that also in case of
use of the full learning rule (27) with (35) and (36) we
will only get satisfactory results if we take the dilution $d$
unequal to zero. In figure 7 the performance is plotted
as a function of the number of input-output relations
$p$ for different values of the dilution $d_O$. The network
performance is optimal around $d_O = 0.9$. The fact that
a dilution here actually enhances the performance is a
consequence of the fact that undesired correlations (see
section V B) diminish with increasing dilution.



FIG. 6: Performance as a function of the ratio of reward
and punishment for $p = 10$ input-output relations. We chose
$\alpha_H = 0.025$, $\rho = 0.05$ and $\eta$ was varied between 0.05 and 0.5.
Other parameters were chosen the same as in the simulations
of figure 5. For values of $\eta/\rho$ around 2, the performance seems
optimal.



In figure 6 we plotted the performance $R$ as a function
of $\eta/\rho$ for $\rho = 0.05$. For values of $\eta$ around two
times the value of $\rho$, the performance is optimal. We
conclude that when the rewarding part of the learning
rule (proportional to $\eta$) becomes smaller and smaller, the
performance of the net strongly decreases. On the other
hand, if the rewarding part becomes larger and larger, the
performance also diminishes, albeit more slowly. These
observations do not come as a surprise: evidently, if the
memorizing Hebbian term is too weak relative to the
Anti-Hebbian term, learning will be slow, whereas it
will also be slow if the Hebbian term becomes too strong,
since then the learning of any pattern will change the existing
connections too wildly.



VII. CONCLUSIONS 

We reached our goal of modeling, in a biologically plausible way,
neural networks capable of learning without the use of extremal
dynamics [3] or some other mechanism to control
the neuronal activity.

In section III A we found, on the basis of four biological
assumptions, two possible learning rules, given by the
equations (28) and (29). By analyzing their effects, we
were able to associate them in a unique way with reward
and punishment, and to formulate a plausible learning
rule (27). For the rewarding part, we used a form derived
in [1]; for the punishing part we chose two constants [see
eq. (33)]. By studying the effects of this specific form
with two constants, we found that this punishing part of
our learning rule is able to control the average activity
(average fraction of firing neurons) in a neuronal net.
We showed that the activity remained around a desired
level while, at the same time, the network searched with
great efficiency for the correct output by generating random
patterns with a desired activity.

Finally, we showed that for neurons with non-zero
threshold potentials, the neural net must be diluted in
order to achieve a reasonable performance: dilution is
found to enhance the functionality of a biological neural
net. This is because the neurons start to behave in a correlated
way when the threshold potentials are non-zero. When
the dilution increases, these undesired correlations decrease.
In nature neural nets normally are strongly diluted:
the human brain, for example, has of the order
of $10^{11}$ neurons but there are only $10^4$ to $10^5$ synapses per
neuron.
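
The dilution implied by these orders of magnitude can be worked out directly. The sketch below assumes, for illustration only, that every neuron could in principle connect to every other neuron, so that the dilution is one minus the ratio of realized to possible connections; the function name is hypothetical.

```python
def dilution_from_counts(n_neurons, synapses_per_neuron):
    """Fraction of absent connections if every neuron could reach every other."""
    possible_partners = n_neurons - 1
    return 1.0 - synapses_per_neuron / possible_partners


n_brain = 10**11
for k in (10**4, 10**5):
    connected_fraction = k / (n_brain - 1)
    print(f"{k:>7} synapses/neuron -> connected fraction {connected_fraction:.1e}, "
          f"dilution {dilution_from_counts(n_brain, k):.7f}")
```

Under this assumption only a fraction of roughly $10^{-7}$ to $10^{-6}$ of the possible connections is realized, far beyond the dilutions of about 0.9 studied in the simulations.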



VIII. OUTLOOK 

Evidently, the study on the behavior of biologically 
inspired neural networks is far from complete. 

One may, for instance, include more biologically known
features into the model, like the time delay of the axonal signal
or the refractory period of a neuron. Also, the architecture
of the net could be made more realistic by
including more layers and adding feed-back and lateral
connections. Furthermore, the model of the surrounding
world, i.e., the input-output relations to be learned,
could be made more realistic, for example by using a dynamic
model of a realistically changing world. Also, the
neuron model could be made to resemble real neurons more
closely by including different types of neurons and synapses,
or using the fact that excitatory synapses are probably
more plastic than the inhibitory ones. Another extension
would be to refine the measure of success. Instead of just
a binary feed-back signal indicating whether the output
is right or wrong in reaction to some input, a feed-back
signal that can take on a range of values is possible. Also,
a relative measure could be used, telling the network if
it has performed better or worse than during a previous
attempt.

A most obvious first extension of our particular model 
would be to include (inhibitory) lateral connections in- 
side layers and possibly also feed-back connections. It 
would be interesting to study the influence of these non- 
feed-forward connections on the behavior of the network, 
and, especially, the effect they will have on the (aver- 
age) activity of the network. Another extension of our 
model could be to associate Hebbian and Anti-Hebbian 
learning with different types of neurons (or synapses) 
instead of letting the same neurons behave differently 
under different conditions of success or failure. Such a 
model could have the advantage that the Hebbian con- 
nections, in which the input-output relations are memo- 
rized on success, would not have to be changed on fail- 
ure. Thus, instead of changing the same connections, dif- 
ferent, Anti-Hebbian connections could possibly do the 
job of searching for successful output patterns, without 
changing the already learned input-output relations. 

As a final remark, we mention that different extensions 
or alterations are possible to enhance the performance of 
the network, which may, however, be implausible biolog- 
ically. For example, if an input-output relation would 
not be strengthened over and over again once it has been 
learned, the network generally would perform better. 